The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

Authors

  • Pragya Sur
  • Yuxin Chen
  • Emmanuel J. Candès

Abstract

Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test (LRT). Indeed, Wilks’ theorem asserts that whenever we have a fixed number $p$ of variables, twice the log-likelihood ratio (LLR) $2\Lambda$ is distributed as a $\chi^2_k$ variable in the limit of large sample sizes $n$; here, $\chi^2_k$ is a chi-square with $k$ degrees of freedom and $k$ is the number of variables being tested. In this paper, we prove that when $p$ is not negligible compared to $n$, Wilks’ theorem does not hold and the chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that $n$ and $p$ grow large in such a way that $p/n \to \kappa$ for some constant $\kappa < 1/2$. (For $\kappa > 1/2$, $2\Lambda \xrightarrow{P} 0$, so the LRT is not interesting in this regime.) We prove that for a class of logistic models, the LLR converges to a rescaled chi-square, namely, $2\Lambda \xrightarrow{d} \alpha(\kappa)\chi^2_k$, where the scaling factor $\alpha(\kappa)$ is greater than one as soon as the dimensionality ratio $\kappa$ is positive. Hence, the LLR is larger than classically assumed. For instance, when $\kappa = 0.3$, $\alpha(\kappa) \approx 1.5$. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations in two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, non-asymptotic random matrix theory, and convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results in this paper extend to some other regression models, such as the probit regression model.
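The abstract does not spell out the two equations. The sketch below assumes the form used for the null case with i.i.d. Gaussian covariates, which matches the definition of $V(\tau)$ quoted in the supplement further down: with $\rho(t) = \log(1 + e^t)$, $\Psi(z; b) = b\rho'(\mathrm{prox}_{b\rho}(z))$ and $Z \sim N(0, 1)$, solve $\tau^2 = \kappa^{-1}\mathbb{E}[\Psi(\tau Z; b)^2]$ and $\kappa = \mathbb{E}[\Psi'(\tau Z; b)]$, then set $\alpha(\kappa) = \tau^2/b$. The function names (`prox`, `moments`, `solve_b`, `alpha`, `pvalue_1df`), the trapezoidal grid on $[-8, 8]$, the tolerances, and the assumption that $V(\tau) - \tau^2$ changes sign exactly once are illustrative choices, not the authors' code.

```python
import math

# Hypothetical sketch: alpha(kappa) for the null logistic model, rho(t) = log(1 + e^t),
# under the ASSUMED system tau^2 = E[Psi(tau Z; b)^2] / kappa, kappa = E[Psi'(tau Z; b)].


def sigmoid(x):
    """Numerically stable rho'(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def prox(z, b):
    """prox_{b rho}(z): the unique x with x + b * rho'(x) = z (safeguarded Newton)."""
    lo, hi = z - b, z  # the root lies in [z - b, z] because 0 < rho' < 1
    x = 0.5 * (lo + hi)
    for _ in range(100):
        s = sigmoid(x)
        h = x + b * s - z
        if h == 0.0:
            return x
        if h > 0:
            hi = x
        else:
            lo = x
        new = x - h / (1.0 + b * s * (1.0 - s))  # Newton step; h'(x) >= 1
        if not lo < new < hi:
            new = 0.5 * (lo + hi)  # fall back to bisection
        if abs(new - x) <= 1e-12 * (1.0 + abs(z)):
            return new
        x = new
    return x


# Trapezoidal rule for E[g(Z)], Z ~ N(0, 1), on the grid -8.0, -7.9, ..., 8.0.
NODES = [-8.0 + 0.1 * i for i in range(161)]
_RAW = [math.exp(-0.5 * z * z) for z in NODES]
_TOTAL = sum(_RAW)
WEIGHTS = [w / _TOTAL for w in _RAW]


def moments(tau, b):
    """Return (E[Psi'(tau Z; b)], E[Psi(tau Z; b)^2])."""
    d_psi = psi_sq = 0.0
    for z, w in zip(NODES, WEIGHTS):
        s = sigmoid(prox(tau * z, b))
        c = b * s * (1.0 - s)  # b * rho''(prox)
        d_psi += w * c / (1.0 + c)  # Psi' = 1 - d prox / dz
        psi_sq += w * (b * s) ** 2  # Psi = b * rho'(prox)
    return d_psi, psi_sq


def solve_b(tau, kappa):
    """Bisection for b with E[Psi'(tau Z; b)] = kappa (increasing in b)."""
    lo, hi = 0.0, 1.0
    while moments(tau, hi)[0] < kappa:
        lo, hi = hi, 2.0 * hi
    while hi - lo > 1e-10 * hi:
        mid = 0.5 * (lo + hi)
        if moments(tau, mid)[0] < kappa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def alpha(kappa):
    """alpha(kappa) = tau^2 / b at the (assumed unique) solution of the system."""
    if not 0.0 < kappa < 0.5:
        raise ValueError("kappa must lie in (0, 1/2)")

    def gap(tau):  # V(tau) - tau^2; assumed positive before the root, negative after
        b = solve_b(tau, kappa)
        return moments(tau, b)[1] / kappa - tau * tau

    lo, hi = 0.0, 1.0
    while gap(hi) > 0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > 1e-7 * hi:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return tau * tau / solve_b(tau, kappa)


def pvalue_1df(two_llr, scale=1.0):
    """P(scale * chi2_1 > two_llr); scale=1.0 gives the classical Wilks p-value (k = 1 only)."""
    return math.erfc(math.sqrt(two_llr / (2.0 * scale)))
```

If the assumed system is right, `alpha(0.3)` should come out close to the value $\alpha(0.3) \approx 1.5$ quoted above. For a single tested coefficient ($k = 1$) with $2\Lambda = 3.84$, `pvalue_1df(3.84)` is about 0.05, whereas the rescaled p-value with scale 1.5 is about 0.11, which illustrates how the classical approximation understates p-values. Bisection is used for both unknowns because each outer step only needs a sign, which keeps the sketch robust without derivatives of $V(\tau)$.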

Similar resources

Supplemental Materials for: “The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square”

This document presents the proof of Lemma 6(ii) given in the paper [1]: “The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square”. Section 1, Proof of Lemma 6(ii): We shall prove that $V(\tau) < \tau$ whenever $\tau$ is sufficiently large. Before proceeding, we recall from the main text and [2, Proposition 6.4] that $V(\tau) := \frac{1}{\kappa}\mathbb{E}[\Psi(\tau Z; b(\tau))] = \frac{1}{\kappa}\mathbb{E}[(b(\tau)\rho'(\mathrm{pr}$...

Empirical likelihood inference for regression model of mean quality-adjusted lifetime with censored data

The authors consider the empirical likelihood method for the regression model of mean quality-adjusted lifetime with right censoring. They show that an empirical log-likelihood ratio for the vector of the regression parameters is asymptotically a weighted sum of independent chi-square distributions. They obtain an adjusted empirical log-likelihood ratio which is asymptotically standard chi-squar...

Systematic Approach for Portmanteau Tests in View of Whittle Likelihood Ratio

Box and Pierce (1970) proposed a test statistic $T_{BP}$, which is the squared sum of $m$ sample autocorrelations of the estimated residual process of an autoregressive-moving average model of order $(p,q)$. $T_{BP}$ is called the classical portmanteau test. Under the null hypothesis that the autoregressive-moving average model of order $(p,q)$ is adequate, they suggested that the distribution of $T_{BP}$ is approximat...

Empirical Likelihood for Nonparametric Additive Models

Nonparametric additive modeling is a fundamental tool for statistical data analysis which allows flexible functional forms for conditional mean or quantile functions but avoids the curse of dimensionality for fully nonparametric methods induced by high-dimensional covariates. This paper proposes empirical likelihood-based inference methods for unknown functions in three types of nonparametric a...

Journal:
  • CoRR

Volume: abs/1706.01191

Publication date: 2017